evoked potential components is described. The results of the study indicate
to the evoked potentials themselves and do not result from interference by

RESEARCH OBJECTIVES

The primary research objective of this project is to develop and evaluate methods
of measuring the parameters of single evoked potential (EP) waveforms. Currently the
research falls into the following two areas.

a.  Investigation  of  pattern  recognition  procedures  for  discriminating  among  various 
evoked  potential  waveforms. 

b.  Investigation  of  preprocessing  and  filtering  techniques  to  provide  improved 
waveform  estimation. 


STATUS  OF  RESEARCH  EFFORTS 

Significant results have been obtained in the areas of research relating to this project.
These are briefly described in the following paragraphs and are further described
in the publications resulting from the research.

Feature  Selection  for  Automatic  Classification.  It  has  been  shown  by  simulation 
and  experiment  that  it  is  possible  to  effectively  classify  EP  waveforms  into  categories 
corresponding  to  the  stimuli  that  produced  them.  Two  problems  associated  with  this 
procedure  are  selection  of  the  best  features  from  the  data  set  to  use  in  making  the 
classification  and  the  errors  that  result  from  interference  due  to  the  ongoing  EEG. 
Both of these problems have been addressed as part of this research project, and
procedures have been developed to mitigate their effects on system performance.

In the classification procedures being utilized here, the features upon which a decision
is made are the waveform amplitudes at regularly spaced sampling intervals.
These  samples  are  taken  from  a  segment  of  the  EP  waveform  immediately  following 
stimulation  and  extending  up  to  as  much  as  500  ms.  The  sampling  interval  is  typically 
4  to  20  ms.  It  is  a  general  property  of  classification  procedures  of  this  type  that 
increasing  the  number  of  features  employed  in  the  classification  improves  performance 
up  to  a  certain  (relatively  small)  number  of  features  after  which  performance  begins  to 
decrease.  The  problem  is  to  select  the  best  subset  of  features  from  the  complete  set. 

Two methods that have been widely used for feature selection are Forward
Sequential Feature Selection (FSFS) and Stepwise Linear Discriminant Analysis
(SLDA). The FSFS algorithm first classifies the training set using each feature
individually. The feature yielding the lowest classification error rate is then
selected. The next step is to combine this first feature with each of the other
features and reclassify the training set to pick the pair of features that gives the
lowest error rate. This process is continued until no further improvement in accuracy
is obtained by adding features or until the desired number of features is reached.
This is not an optimum procedure because there is no basis for assuming that the
feature that works best alone will, when paired with some other feature, give the best
two features for classification, and the same applies for larger numbers of features.
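
The greedy procedure described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name, the `error_rate` callback (which stands in for designing and testing a classifier on a feature subset) and the toy error table are all hypothetical. The toy table is chosen so that the best single feature is not part of the best pair, which is the nonoptimality noted in the text.

```python
from typing import Callable, List, Sequence


def forward_sequential_selection(
    n_features: int,
    error_rate: Callable[[Sequence[int]], float],
    max_features: int,
) -> List[int]:
    """Greedy forward selection: repeatedly add the feature that lowers the error most."""
    selected: List[int] = []
    best_error = float("inf")
    while len(selected) < max_features:
        remaining = [f for f in range(n_features) if f not in selected]
        if not remaining:
            break
        # Score every subset formed by adding one more feature to the current set.
        trial_error, trial_feature = min(
            (error_rate(selected + [f]), f) for f in remaining
        )
        # Stop when adding a feature no longer improves the error rate.
        if trial_error >= best_error:
            break
        selected.append(trial_feature)
        best_error = trial_error
    return selected


# Hypothetical error rates for three features (illustration only).
_TOY_ERRORS = {
    frozenset([0]): 0.40, frozenset([1]): 0.42, frozenset([2]): 0.30,
    frozenset([0, 1]): 0.05, frozenset([0, 2]): 0.25, frozenset([1, 2]): 0.28,
}


def toy_error(subset: Sequence[int]) -> float:
    return _TOY_ERRORS.get(frozenset(subset), 0.5)


result = forward_sequential_selection(3, toy_error, 2)
```

With this toy table the greedy search keeps feature 2 from the first step and ends with the pair (2, 0), although the pair (0, 1) has a much lower error rate.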

The SLDA algorithm proceeds in much the same manner as the FSFS algorithm
except that the criterion used to select the best feature at each step is based upon a
statistical test rather than a computed error rate. The test used is a one-way analysis
of variance. The SLDA also tests for loss of significance of any of the features already
entered and thus can remove variables or features that have been previously selected.
However, in the applications that we have made of this procedure, previously selected
variables have not been removed. This procedure is also nonoptimum
and is applicable only to linear discriminant analysis.
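
As a rough illustration of the statistical criterion, the sketch below computes the one-way analysis of variance F statistic for each feature and ranks the features by it. It covers only the first selection step under simplifying assumptions. In standard stepwise discriminant analysis the later steps typically adjust for the features already entered, and that adjustment is not shown here. The function names and data layout are hypothetical.

```python
from typing import List, Sequence


def one_way_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic: between-class over within-class mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def rank_features_by_f(
    class_a: Sequence[Sequence[float]], class_b: Sequence[Sequence[float]]
) -> List[int]:
    """Rank feature indices by F statistic; each row is one trial's feature vector."""
    n_features = len(class_a[0])
    scores = [
        one_way_f([[row[j] for row in class_a], [row[j] for row in class_b]])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```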

One method of obtaining an optimum feature subset is to test all possible subsets
for their performance in a classifier. The difficulty with this procedure is that a great
amount of computation is required in order to test every combination of features that is
available from a large set. For example, selecting five features out of a set of 27 would
require the design and testing of 100,000 different classifiers. However, it was found
that by writing all computer subroutines in assembly language and by calculating the
discriminant function using the algorithm for regression by “leaps and bounds”* the
computation could be made feasible for linear discriminant analysis. Using this procedure
it is then possible to compute the optimum feature set and to compare its performance
with that of the features selected by the FSFS and
SLDA algorithms. Some additional information can also be obtained by examining
which features are selected by each procedure to see what commonality exists among
them.
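
To give a sense of the scale involved, the sketch below counts the candidate subsets and shows a brute-force search. It is only an illustration: it does not implement the leaps-and-bounds algorithm, and the `error_rate` callback is a hypothetical stand-in for designing and testing one classifier. The count of subsets of exactly five features from 27 is 80,730, and counting all subsets of one to five features gives 101,583, which is of the same order as the figure quoted in the text.

```python
from itertools import combinations
from math import comb
from typing import Callable, Sequence, Tuple


def exhaustive_selection(
    n_features: int,
    subset_size: int,
    error_rate: Callable[[Sequence[int]], float],
) -> Tuple[int, ...]:
    """Brute force: evaluate every subset of the given size and keep the best one."""
    return min(combinations(range(n_features), subset_size), key=error_rate)


# Number of classifiers that must be designed and tested.
exactly_five = comb(27, 5)                           # subsets of exactly 5 features
up_to_five = sum(comb(27, k) for k in range(1, 6))   # all subsets of sizes 1 to 5
```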

The classification performance using features selected by each of the three techniques
was measured using both artificial simulation data and real evoked potential
data. The simulated data was generated by adding known deterministic signals to
random noise sequences. The noise was generated as a Markov process having various
half-power bandwidths. The signals corresponded to average evoked potential
waveforms measured experimentally using the unexpected event paradigm in which a
letter regularly appears but is inverted in a random manner approximately 10% of the
time. An experiment of this type is the source of the real data that is considered
subsequently. Tests were made using noise of various bandwidths ranging from 4 to 25 hertz
and evaluation was made by training and testing on the same data set. Eighty samples
in each class were used in the simulation. Three linear classifiers using features selected
by FSFS, SLDA, and Exhaustive Search Feature Selection (ESFS) and a quadratic
classifier using FSFS were evaluated.
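
A simulated trial of this kind might be generated as sketched below. The sketch rests on several assumptions that the text does not state: the Markov noise is taken to be a first-order Gauss-Markov process with an exponential autocorrelation, the half-power bandwidth is taken as the frequency at which its power spectrum falls to half its zero-frequency value, and the signal-to-noise ratio is taken as peak signal power over noise variance. All function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def markov_noise(
    n_samples: int, dt: float, half_power_bw_hz: float, sigma: float,
    rng: random.Random,
) -> List[float]:
    """First-order Gauss-Markov noise with autocorrelation sigma^2 * exp(-beta*|tau|)."""
    beta = 2.0 * math.pi * half_power_bw_hz   # rad/s; spectrum is halved at this frequency
    a = math.exp(-beta * dt)                  # sample-to-sample correlation
    drive = sigma * math.sqrt(1.0 - a * a)    # keeps the variance equal to sigma^2
    x = rng.gauss(0.0, sigma)                 # start in the steady state
    out = []
    for _ in range(n_samples):
        out.append(x)
        x = a * x + drive * rng.gauss(0.0, 1.0)
    return out


def simulated_trial(
    signal: Sequence[float], dt: float, half_power_bw_hz: float, snr_db: float,
    rng: random.Random,
) -> List[float]:
    """Add Markov noise to a known signal at the requested SNR (assumed definition)."""
    peak = max(abs(v) for v in signal)
    sigma = peak / math.sqrt(10.0 ** (snr_db / 10.0))
    noise = markov_noise(len(signal), dt, half_power_bw_hz, sigma, rng)
    return [s + n for s, n in zip(signal, noise)]
```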

*Furnival, G. M. and Wilson, R. W., "Regression by Leaps and Bounds," Technometrics,

Table 1 shows the results for noise bandwidths of 8 hertz and for signal-to-noise
ratios of -3dB, -6dB and -9dB. For this case the noise bandwidths were the same for
the two classes and so their covariance matrices were also identical for the two classes.
Table 2 shows the results for the case when the covariance matrices of the noise are
different for the two classes as a result of using a noise bandwidth of 8 hertz for one
class and 14 hertz for the other class. It is seen from Table 2 that ESFS has the same
kind of improvement in performance for the case of unequal covariance matrices as it
did for equal covariance matrices for the two classes. The FSFS procedure with a quadratic
discriminant function seems to give somewhat better performance for the
unequal covariance matrices than do the linear discriminant functions.

A  comparison  of  the  several  feature  selection  and  classification  procedures  using 
measured  evoked  potentials  was  also  carried  out.  Data  was  collected  from  four  human 
subjects  for  the  unexpected  event  paradigm.  A  sequence  of  letter  v’s  was  shown  to  the 
subject  with  a  random  occurrence  of  an  inverted  v  occurring  10%  of  the  time.  For 
each  subject  three  electrode  sites  were  used,  Cz,  Pz  and  Oz.  Feature  selections  and 
classifications  for  all  electrode  sites  were  then  carried  out.  Table  3  shows  the  results  for 
one of the subjects. As in the case of the simulated data the ESFS procedure gave a
modest improvement in classification performance over the other linear methods. However,
the quadratic discriminant function using FSFS in a number of cases gave better
performance than the ESFS procedure.

In  addition  to  the  classification  results  using  the  experimental  data  a  record  was 
also  made  of  the  features  actually  selected  by  the  different  procedures.  Table  4  shows 
the  features  that  were  selected  at  the  various  steps  for  the  same  subject  as  in  Table  3. 
In most instances it was found that the first feature selected was the same by all
methods but from then on there was often divergence, particularly with the ESFS algorithm.
Oftentimes the initial feature chosen or early feature sets are not contained in
the final feature sets. An example of this is shown in Table 4 for electrode Cz, where
none  of  the  features  selected  in  Steps  1  and  2  were  included  in  the  feature  set  obtained 
at  Step  5. 

The  general  conclusion  of  these  tests  is  that  the  exhaustive  search  feature  selection 
procedure  gives  a  modest  improvement  in  performance  of  the  classifier  for  both  the 
simulated  and  experimental  data.  In  almost  all  cases  features  selected  at  the  various 
steps  of  ESFS  did  not  contain  all  the  features  selected  in  previous  steps.  Features  were 
constantly replaced by new ones when moving to the next step. This was never the
case for SLDA even though SLDA contained the capability of removing features at
each step.

Table 1

Table 2

Artificial, unequal covariance. Data sets 2e and Bu. -3dB S/N.

Table 4

Real data. Subject 3. Electrode Oz.

Real data. Subject 3. Electrode Pz.

Time-Varying Filter. In the measurement and analysis of evoked potentials it is
generally known within what time epoch the signal occurred. However, often the signal
can take a variety of forms through variations in latency, amplitude or even occurrence
of particular components in the EP. By incorporating all known a priori information,
both deterministic and probabilistic, into the processor design it is possible to obtain
substantial improvements in waveform estimation over processors not utilizing this
information. Because of the signal’s occurrence in a particular time epoch in combination
with the ongoing EEG, the resulting waveform is a sample function of a nonstationary
random process and the optimum processor takes the form of a time-varying
filter. For processing sampled data the filter is a matrix operator whose elements are
determined from the known or assumed parameters of the underlying random processes
associated with the signal and the ongoing EEG. The filter operator is selected to
minimize the mean square error of the estimate and is given by the following expression

$$H = K_{sx} K_{xx}^{-1}$$

where $K_{sx}$ is the cross covariance matrix of the signal and data and $K_{xx}$ is the
covariance matrix of the measured data.
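
For reference, a brief sketch of where a minimum mean square error operator of this form comes from. The notation is an assumption made for illustration: $s$ is the signal vector, $x$ the data vector, both taken as zero mean, $K_{sx}$ the cross covariance of signal and data and $K_{xx}$ the data covariance. The estimate $\hat{s} = Hx$ minimizes the mean square error when the error is uncorrelated with the data. The last line adds the further assumption that the data is signal plus uncorrelated additive noise $n$ with covariance $K_{nn}$.

```latex
\begin{aligned}
E\{(s - Hx)\,x^{T}\} &= 0 \\
K_{sx} - H K_{xx} &= 0 \\
H &= K_{sx} K_{xx}^{-1} \\
x = s + n,\; E\{s\,n^{T}\} = 0 \;&\Rightarrow\; K_{sx} = K_{ss},\quad K_{xx} = K_{ss} + K_{nn}
\end{aligned}
```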

The matrix H can be thought of as an operator that projects the data vector from
the high dimensionality measurement space into the low dimensionality signal space.
The dimensionality of the signal is determined by the number of significant eigenvectors
of the operator and this in turn depends upon the covariance matrices themselves.
To a first approximation the more that is known about a signal the lower will
be the dimensionality of the signal space. For example, if the signal were deterministic
except for amplitude the signal space would be one dimensional and only the signal
waveform would appear at the filter output regardless of the input.
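
The one dimensional case can be made concrete with a small sketch. It assumes a signal of known shape `phi` with a zero-mean random amplitude of variance `var_amp`, in white noise of variance `var_noise`. These names and the white-noise assumption are illustrative and are not taken from the report. Under those assumptions the filter matrix reduces to `var_amp * phi * phi^T / (var_noise + var_amp * |phi|^2)`, so the output is always a scaled copy of `phi`.

```python
from typing import List, Sequence


def rank_one_filter(
    x: Sequence[float], phi: Sequence[float], var_amp: float, var_noise: float
) -> List[float]:
    """Apply H = var_amp * phi phi^T / (var_noise + var_amp * |phi|^2) to the data x."""
    energy = sum(p * p for p in phi)                  # |phi|^2
    projection = sum(p * v for p, v in zip(phi, x))   # phi^T x
    gain = var_amp * projection / (var_noise + var_amp * energy)
    # Whatever the input, the output is the known waveform times a scalar gain.
    return [gain * p for p in phi]
```

For example, with `phi = [0, 1, 2, 1, 0]`, `var_amp = 1` and `var_noise = 2`, the input `[3, -1, 4, 1, -5]` has projection 8 and energy 6, so the gain is 1 and the output is `phi` itself.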

The matrix operator filter design can be carried out for EP waveforms in the following
manner. From an ensemble of measured waveforms the latency corrected average
(LCA) is computed. From the output of the LCA procedure the individual components
in the waveform are identified, their shapes determined and the means and
standard deviations of their latency variations estimated. From this information the
covariance matrix of the signal can be determined and the covariance matrix of the
measured data is calculated directly from the data set. From these two matrices the
filter matrix is computed.

Filters have been designed and tested using both simulated and measured data.
Figure 1 shows an example of filtering simulated data. Figure 1a is the signal, Figure
1b shows samples of signal plus noise and Figure 1c shows the filtered versions of Figure
1b. A filter was designed for EP waveforms obtained using a checkerboard visual
stimulus. Figure 2 shows individual measured EP waveforms (dashed) and filtered versions
of the waveforms (solid). The very substantial noise reduction performance of
this filter is readily evident in Figures 1 and 2. The performance of this type of filter
for quantitative measurements on single EP waveforms is being evaluated.

Evoked potential waveforms before (dashed) and after (solid)
processing by the time-varying filter.

Effects of Noise on Latency Measurements of EP Components. Whenever repetitive
measurements of evoked potentials are made it is found that there are significant
variations in the amplitudes and latencies of the individual peaks or components that
are present. This is a well known phenomenon and has frequently been discussed in the
literature.  Figure  3  shows  a  histogram  of  the  latencies  of  peaks  identified  in  100  visual 
evoked  potentials  elicited  by  a  flash  stimulus  for  a  subject  with  eyes  closed.  It  is  seen 
from  the  figure  that  the  measured  latencies  are  distributed  around  a  mean  value  in  a 
manner  suggesting  that  a  certain  amount  of  randomness  is  associated  with  their 
occurrence. 

There are two possible causes for the randomness associated with measurements of
this kind. First, the measurements of EPs are made in the presence of the ongoing
EEG, which may significantly affect the shape of the observed waveform and alter both
latency and amplitude estimates. Second, the EP waveform itself may be varying from
stimulus to stimulus. It is important for several reasons to be able to differentiate
between these two effects. One reason is to better define the characteristics and parameters
associated with the single EP. Another is to provide quantitative information for
the design of improved signal processors for analysis and classification of single EPs. It
is this latter reason that led to the research described here, which is aimed at establishing
the degree to which the presence of the ongoing EEG, considered as an additive
noise component, can affect the latency measurements of components in the EP.

This  problem  was  attacked  theoretically  and  the  resulting  analytical  expressions 
then  checked  empirically.  The  analysis  was  carried  out  as  follows.  It  was  assumed  that 
an  EP  waveform  could  be  approximated  in  the  vicinity  of  a  peak  by  a  second  order 
polynomial.  The  coefficients  of  the  polynomial  were  determined  by  means  of  a  least 
squares  fitting  of  5  points  in  the  vicinity  of  a  peak.  The  errors  in  estimating  these 
parameters  in  the  presence  of  noise  were  also  determined  theoretically.  Once  the 
parameters  are  known  the  peak  location  is  readily  found.  The  variance  in  the  peak 
location  estimate  due  to  the  presence  of  noise  can  be  determined  from  the  errors  that 
occur  in  the  parameter  estimates.  A  simulation  was  performed  to  verify  the  theoretical 
results.  Noise  was  added  to  a  known  waveshape  and  the  peak  location  determined. 
This  was  repeated  many  times  and  the  variance  of  the  peak  location  determined.  Tests 
were  carried  out  utilizing  both  white  noise  and  noise  having  the  same  covariance  as 
measured EEG signals. It was found that white noise gave a slightly higher variance in
the  component  latency  than  did  the  EEG  noise.  However,  the  check  in  both  cases 
between  the  experimentally  determined  values  and  the  theoretical  values  was  quite 
close. 
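
The fitting and simulation steps can be sketched as follows. This is a minimal illustration under stated assumptions: the five samples are equally spaced and centered near the peak, only white noise is simulated (the EEG-covariance case is not shown), and the function names are hypothetical. For sample offsets -2 to 2 the least squares quadratic `a + b*k + c*k^2` has the closed-form coefficients used below, and the peak lies at `-b / (2*c)`.

```python
import math
import random
from typing import Callable, Sequence


def quadratic_peak_offset(y: Sequence[float]) -> float:
    """Peak location, in sample units from the center, of a 5-point quadratic fit."""
    if len(y) != 5:
        raise ValueError("expected exactly five samples")
    offsets = (-2, -1, 0, 1, 2)
    sum_y = sum(y)
    sum_ky = sum(k * v for k, v in zip(offsets, y))
    sum_k2y = sum(k * k * v for k, v in zip(offsets, y))
    b = sum_ky / 10.0                      # linear coefficient
    c = (sum_k2y - 2.0 * sum_y) / 14.0     # quadratic coefficient
    return -b / (2.0 * c)


def latency_jitter_std(
    peak_shape: Callable[[float], float], dt: float, noise_sigma: float,
    trials: int, rng: random.Random,
) -> float:
    """Monte Carlo standard deviation (seconds) of the fitted peak time in white noise."""
    clean = [peak_shape(k * dt) for k in (-2, -1, 0, 1, 2)]
    estimates = []
    for _ in range(trials):
        noisy = [v + rng.gauss(0.0, noise_sigma) for v in clean]
        estimates.append(quadratic_peak_offset(noisy) * dt)
    mean = sum(estimates) / trials
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (trials - 1))
```

A usage example would pass a known waveshape, such as a cosine lobe `lambda t: math.cos(math.pi * t / 0.030)`, and compare the simulated standard deviation at several noise levels.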

Latency variation or jitter due to noise depends primarily upon two parameters:
the signal-to-noise ratio and the radius of curvature of the peak. Narrow peaks and
high signal-to-noise ratios reduce the latency jitter produced by additive noise. The
following expression for the standard deviation of latency jitter was obtained.

$$\sigma = \frac{0.88\,R\,W_n}{\sqrt{\mathrm{SNR}}}$$

where $R$ is the radius of curvature of the peak, $W_n$ is the noise bandwidth and SNR is
the signal-to-noise ratio computed as the square of the peak signal amplitude divided
by the variance of the noise. This equation is valid for any sampling frequency equal to
or greater than $2W_n$. A convenient pulse shape for computation purposes is a single
lobe of a cosine wave. If the duration of this pulse is $T$ seconds, then the radius of
curvature is $(T/\pi)^2$. To illustrate the nature of the results consider a case in which an EP
component has a duration of 30 ms, the noise bandwidth is 25 Hz and the SNR is 0 dB.
The standard deviation of the latency jitter for this case would be 2 ms. Values determined
from the LCA are typically 7 to 10 ms for measured EPs having similar parameters.
This indicates that the variations in latency must be almost entirely due to variations
in the latency of the components themselves and not due to the effects of additive
noise. These results add considerable confidence to the design procedures for the
improved signal processing techniques that make use of the random variations in signal
latency.
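
The numerical example can be checked with a short calculation. The sketch assumes the jitter expression has the form 0.88 R Wn / sqrt(SNR), with R = (T/pi)^2 for a cosine lobe of duration T and SNR expressed as a power ratio. This form is a reading of the expression that reproduces the 2 ms figure quoted in the text. The function names are hypothetical.

```python
import math


def cosine_lobe_radius(duration_s: float) -> float:
    """Radius of curvature (T/pi)^2 at the peak of a unit-amplitude cosine lobe."""
    return (duration_s / math.pi) ** 2


def latency_jitter_sigma(radius: float, noise_bw_hz: float, snr_db: float) -> float:
    """Assumed form: sigma = 0.88 * R * Wn / sqrt(SNR), with SNR as a power ratio."""
    snr = 10.0 ** (snr_db / 10.0)
    return 0.88 * radius * noise_bw_hz / math.sqrt(snr)


# 30 ms component, 25 Hz noise bandwidth, 0 dB SNR: close to 2 ms.
sigma_s = latency_jitter_sigma(cosine_lobe_radius(0.030), 25.0, 0.0)
```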